[00:22.840 --> 00:28.480]  Cool. Thank you very much. And I really appreciate being here, and I really appreciate your time.
[00:28.480 --> 00:32.080]  So I'm more than happy to answer any of your questions. I'm looking mostly at the Discord
[00:32.080 --> 00:37.000]  channel right now. My beard is awesome, and I'm not on a boat, but I wish I was.
[00:37.320 --> 00:43.340]  But just kind of... I didn't plan this out here, but you gotta respect the beard here.
[00:43.380 --> 00:50.520]  So on more serious questions here, I guess, what can I do to help answer?
[00:53.020 --> 01:00.300]  And so Markov, it really has very little math actually in it, too. It's just multiplication.
[01:00.380 --> 01:05.980]  So if you can do multiplication, there's not even any division. So you just look at the
[01:05.980 --> 01:09.660]  probability of one letter falling after another letter, falling after another letter there.
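
A minimal sketch of the "just multiplication" idea described above, assuming a first-order (one letter of context) model. The training words, the `^` start marker, and the function names are made up for illustration and are not taken from any particular tool. Division only happens once, when the table is built; scoring a candidate is pure multiplication.

```python
from collections import Counter, defaultdict

def train_bigrams(words):
    """Build a table of P(next letter | previous letter); '^' marks the word start."""
    counts = defaultdict(Counter)
    for word in words:
        prev = "^"
        for ch in word:
            counts[prev][ch] += 1
            prev = ch
    # Normalise the counts once so that scoring needs multiplication only.
    return {
        prev: {ch: n / sum(following.values()) for ch, n in following.items()}
        for prev, following in counts.items()
    }

def markov_probability(word, probs):
    """Multiply the probability of each letter following the one before it."""
    prob = 1.0
    prev = "^"
    for ch in word:
        prob *= probs.get(prev, {}).get(ch, 0.0)  # unseen transitions score zero
        prev = ch
    return prob

# Hypothetical three-word training set.
PROBS = train_bigrams(["pass", "past", "post"])
```

With this toy table, `markov_probability("past", PROBS)` is higher than `markov_probability("pass", PROBS)`, because "t" follows "s" more often than "s" does in the training words.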
[01:09.700 --> 01:15.720]  And the more letters you have, the higher probability it usually is. OMEN gets a little
[01:15.720 --> 01:21.140]  bit more complicated because it has this idea behind it called levels. But then, instead of
[01:21.140 --> 01:24.880]  multiplication, you just do addition. So the final level of the password
[01:24.880 --> 01:28.820]  is summing all the levels across it. And the reason why you want to do that is because that's
[01:28.820 --> 01:33.780]  much faster than trying to do a more traditional Markov attack like you would see in John the Ripper's
[01:33.780 --> 01:38.620]  Markov mode. But there's a million different ways that you can do Markov. So I could talk
[01:38.620 --> 01:41.840]  to you all day about that if we want to just go ahead and get into that.
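
The level idea mentioned above can be sketched as follows. This is an illustration only: the bucketing used here (rounding the negative base-2 logarithm to an integer) is an assumption chosen to keep the arithmetic easy to follow and is not claimed to be the exact formula any tool uses, and the transition probabilities are invented. The point is that once each transition has an integer level, the level of a whole password is a sum rather than a product.

```python
import math

# Hypothetical transition probabilities, keyed by (previous letter, next letter).
TRANSITIONS = {
    ("^", "p"): 0.5,
    ("p", "a"): 0.25,
    ("a", "s"): 0.5,
    ("s", "s"): 0.125,
}

def to_level(prob):
    """Bucket a probability into an integer level (assumed scheme: rounded -log2)."""
    return round(-math.log2(prob))

def password_level(word, transitions):
    """Sum the per-transition levels; a lower total means a more probable guess."""
    level = 0
    prev = "^"
    for ch in word:
        level += to_level(transitions[(prev, ch)])
        prev = ch
    return level
```

Here `password_level("pass", TRANSITIONS)` adds 1 + 2 + 1 + 3, which matches the product of the four probabilities being 2 to the power of -7. Working with small integers is what makes it cheap to enumerate every guess at a given total level.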
[01:43.860 --> 01:48.900]  Hashcat needs to update its Markov ability, though, because it is probably, of all the
[01:48.900 --> 01:57.750]  algorithms, the worst. But Hashcat's really fast, so it can get away with that. And I know you're
[01:57.750 --> 02:01.470]  just trying to be cheeky about that, but here's a... just trying to give you an answer to that,
[02:01.470 --> 02:20.150]  because I really enjoy it. It would be nice... the challenge is that the guess generation
[02:20.150 --> 02:29.370]  part could probably be extracted out, but figuring out what kind of rule to use is pretty
[02:29.370 --> 02:35.810]  much single-threaded, unfortunately. And that's really what takes all the time, is just trying
[02:35.810 --> 02:40.350]  to figure out which of the rules is currently the most probable in order to run. And that's why it
[02:40.350 --> 02:45.330]  takes so much memory, because it's basically building a probability queue of all different
[02:45.330 --> 02:50.130]  probabilities. It's going through... it has a kind of a tree type of a format there. So,
[02:50.130 --> 02:54.690]  long story short, I wish I could figure out how to do that. But what you're probably better off
[02:54.690 --> 02:59.330]  doing is, instead of trying to generate all the guesses in probability order, you could actually
[02:59.330 --> 03:03.710]  just go ahead and generate guesses above a given probability threshold. So, all these
[03:03.710 --> 03:08.370]  guesses that are higher probability than this level here. And in that case, you could absolutely go
[03:08.370 --> 03:13.950]  ahead and multi-thread it for GPU. So, I've actually talked to some people about potentially
[03:13.950 --> 03:17.590]  getting this in a way so that you're not doing probability order, but you set that threshold,
[03:17.590 --> 03:21.430]  and you could actually run this on the GPU. And the fact that GPUs have more memory now
[03:21.430 --> 03:26.470]  means you can start putting the base grammar inside the GPU as well to help speed that up.
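
A rough sketch of the threshold idea described above, using a toy grammar whose structures, words, and probabilities are all invented for illustration. Instead of keeping a priority queue to emit guesses in strict probability order, each base structure is expanded on its own and any branch whose probability has already dropped below the threshold is pruned. Because the structures do not depend on each other, each one could in principle be handed to a separate worker.

```python
# Toy grammar (hypothetical values): base structures and their terminals.
BASE_STRUCTURES = {("A", "D"): 0.7, ("A",): 0.3}
TERMINALS = {
    "A": [("password", 0.6), ("monkey", 0.4)],
    "D": [("1", 0.5), ("123", 0.3), ("12", 0.2)],
}

def guesses_above(structure, prob, threshold, prefix=""):
    """Yield (guess, probability) for one base structure, pruning below threshold."""
    if prob < threshold:
        return  # probabilities only shrink as terminals are added, so stop here
    if not structure:
        yield prefix, prob
        return
    head, rest = structure[0], structure[1:]
    for text, p in TERMINALS[head]:
        yield from guesses_above(rest, prob * p, threshold, prefix + text)

def all_guesses_above(threshold):
    """Collect guesses from every base structure; output is NOT in probability order."""
    found = []
    for structure, p in BASE_STRUCTURES.items():
        found.extend(guesses_above(structure, p, threshold))
    return found
```

Calling `all_guesses_above(0.15)` on this toy grammar keeps only `password1` and `password`; lowering the threshold admits more guesses without any shared queue between structures.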
[03:27.330 --> 03:32.770]  So, I'm sorry, I should ask, answer, respond to the question here. So, the question was,
[03:32.770 --> 03:39.050]  could I run multiple PCFGs at once on multiple different GPUs? So, the short answer is,
[03:39.050 --> 03:43.470]  theoretically, yes. But I could definitely use some help in actually coding that up
[03:43.470 --> 04:03.420]  with somebody who's smarter than I am. So, does anyone else have any questions here? Or I could
[04:03.420 --> 04:13.480]  just keep on rambling here, too. So, one nice thing is being able to run this against a password
[04:13.480 --> 04:18.360]  list that you're currently cracking, too. Because it does do a pretty good job of stemming that and
[04:18.360 --> 04:24.260]  creating some really interesting input dictionaries for you to be able to use. So, if you're kind of
[04:24.260 --> 04:29.840]  going ahead and doing kind of like a fingerprint attack, just kind of run it around again there,
[04:29.840 --> 04:35.280]  it can actually extract some really useful bits for that to keep on launching against people.
[04:50.170 --> 04:54.970]  So, since I got like nine minutes left here, unless someone else asked a question. Oh,
[04:54.970 --> 04:59.030]  ramble away. Okay, sounds good here. So, the question is, or I guess what I could start
[04:59.030 --> 05:05.550]  talking about is where password-cracking PCFGs are kind of going in the future. So, one of my big
[05:05.550 --> 05:09.890]  goals is I really would like to be able to get it incorporated into Hashcat. I started, you know,
[05:09.890 --> 05:14.490]  brainstorming that quite a while ago. That's also part of the reason why I wrote the compiled C
[05:14.490 --> 05:19.210]  version in the first place: because Hashcat, for some reason, does not actually run
[05:19.210 --> 05:27.270]  in Python because it's fast, I guess. So, that is, you know, ultimately kind of the end goal
[05:27.870 --> 05:34.250]  for that effort there is to actually incorporate it into Hashcat as a new cracking mode. The
[05:34.250 --> 05:38.550]  initial way I'm going to start looking at is actually having it run as a, Atom calls it a
[05:38.550 --> 05:44.290]  slow guesser mode, which pretty much defines this exactly. So, I want to be able to incorporate that
[05:44.290 --> 05:49.470]  as part of the Hashcat slow guessing modes that you can add in. Eventually, though, I would like
[05:49.470 --> 05:54.990]  to be able to start going ahead and seeing whether I can go ahead and parallelize the way it generates
[05:54.990 --> 05:59.910]  guesses and put it onto the GPU to make it much, much faster. So, there's other options too that
[05:59.910 --> 06:06.850]  potentially you could basically do... what really slows it down is generating all those rules.
[06:06.990 --> 06:11.650]  So, in an older version, I had actually had the ability to go ahead and pre-compute all those
[06:11.650 --> 06:17.050]  rules and be able to save those to disk. So, you could actually then completely parallelize it if you
[06:17.050 --> 06:21.110]  wanted to as well because there's no rule generation. So, that's another thing I could
[06:21.110 --> 06:26.490]  take a look back at there. The problem is that the rules are really fine-grained. So, in order
[06:26.490 --> 06:32.070]  to reduce the size of the rules on disk, I almost want to go ahead and
[06:32.070 --> 06:37.110]  make the rules kind of fatter, essentially. So, each rule would correspond to more password
[06:37.110 --> 06:41.890]  guesses. But that's definitely an option that I'm kind of looking at as well, to make this
[06:42.310 --> 06:53.690]  more feasible to use. So, is there such a thing as a random PCFG rule? Are they
[06:53.690 --> 07:01.470]  human-readable or, well, geek-readable? So, the short answer is yes. So, if you hit the
[07:01.470 --> 07:07.490]  status output when you're running a PCFG cracking session, it'll kind of print out to the
[07:07.490 --> 07:14.370]  text screen what the current rule set looks like. And it looks a lot like a Hashcat mask,
[07:14.370 --> 07:19.070]  actually. It just has a few extra features added onto it there. So, you'll see something
[07:19.070 --> 07:26.990]  like the... I use A for alpha string, but A5, so you know it's generating a
[07:26.990 --> 07:31.710]  five-letter word. And it'll have a number next to it saying what the
[07:31.710 --> 07:37.550]  probability, or ranking, I should say, of that five-letter word is. So, it'll say the
[07:37.550 --> 07:43.590]  153rd most probable word, or number one. So, it'd be the first most probable word. So, that'd be
[07:43.590 --> 07:46.830]  password. So, in that case, you can kind of read it, and then you can see what the capitalization mode is.
[07:46.830 --> 07:51.210]  So, it'll say the most common capitalization mode for this here. And it'll say D2,
[07:51.210 --> 07:55.630]  and it'll be a number after that, too. It's the third most common two-letter, or
[07:55.630 --> 08:02.310]  two-digit number. So, you can kind of look at that and see what that rule is. So, because of the way
[08:02.310 --> 08:08.870]  that it works there, you could absolutely go ahead and create a random rule for that.
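
To make the rule description above concrete, here is a small sketch of how a rule made of ranked pieces could be expanded into guesses. Everything in it is hypothetical: the `A5`, `C5`, and `D2` keys, the rank lists, the `U`/`L` capitalization masks, and the dictionary layout are invented for illustration and are not the tool's actual on-screen or on-disk format. Each rank holds a list, because items seen equally often share a rank and are all used.

```python
from itertools import product

# Hypothetical grammar: for each category, a list indexed by rank (rank 1 first).
# Each rank holds every item that shares that probability.
GRAMMAR = {
    "A5": [["hello", "world"], ["apple"]],   # five-letter alpha strings
    "C5": [["LLLLL"], ["ULLLL"]],            # capitalization masks (U = upper, L = lower)
    "D2": [["12"], ["23"], ["11", "99"]],    # two-digit strings
}

def apply_mask(word, mask):
    """Apply a capitalization mask letter by letter."""
    return "".join(ch.upper() if m == "U" else ch.lower() for ch, m in zip(word, mask))

def expand_rule(rule, grammar):
    """Turn one rule (category -> rank) into every guess it stands for."""
    words = grammar["A5"][rule["A5"] - 1]
    masks = grammar["C5"][rule["C5"] - 1]
    digits = grammar["D2"][rule["D2"] - 1]
    return [apply_mask(w, m) + d for w, m, d in product(words, masks, digits)]

# "Most probable five-letter word, second most common capitalization,
#  third most common two-digit number."
RULE = {"A5": 1, "C5": 2, "D2": 3}
```

Reading `RULE` by hand and looking up each rank in `GRAMMAR` gives the same four guesses that `expand_rule(RULE, GRAMMAR)` returns, which is the sense in which such a rule is readable.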
[08:08.870 --> 08:12.990]  Or you can just kind of look at it and read what it's actually doing. And so, you can go back to
[08:12.990 --> 08:17.910]  the grammar and say, okay, what is actually the 143rd most probable word? And it's, okay, so that's
[08:17.910 --> 08:22.930]  the word that it's using. And if the words have the same probability, it'll just use all of them
[08:22.930 --> 08:26.590]  there. So, there might be multiple different words that you only saw in the training set, let's say,
[08:26.590 --> 08:33.090]  twice. So, they'd all be used with the same probability in that rule. You can also create, when talking
[08:33.090 --> 08:37.510]  about random passwords, some work I did in the past was creating what's called honeywords. So,
[08:37.510 --> 08:43.550]  these are passwords that look like human passwords. And PCFGs are really good for that. So,
[08:43.550 --> 08:47.050]  instead of trying to create the most probable password, you do a random walk through the
[08:47.050 --> 08:50.710]  grammar. And that way, you know, 123456 is still going to be very common. You're going to
[08:50.710 --> 08:54.210]  create that much more often than normal. But you also create some kind of random looking passwords
[08:54.210 --> 08:58.570]  too. And where this is really useful, for example, is if you're trying to set up a honeypot server,
[08:58.570 --> 09:04.070]  and you want to do like a honeypot Active Directory controller, and you don't want to go
[09:04.070 --> 09:10.250]  ahead and manually create a thousand different passwords for different fake users. So, that way,
[09:10.250 --> 09:14.890]  you can go ahead and run that. And you create passwords that, you know, if you glance at it,
[09:14.890 --> 09:18.390]  it looks somewhat real. You're still seeing like the really common passwords. You might see the
[09:18.390 --> 09:22.970]  company names, you know, the most common there. But you're still seeing some random other stuff
[09:22.970 --> 09:28.290]  as well. So, that's definitely some work. And to improve upon that, I'd like to be able to
[09:28.290 --> 09:33.350]  create a modification where it has a biased random walk. So, that way you
[09:33.350 --> 09:36.830]  can create passwords that look like they all came from the same user, but once again are, you know,
[09:36.830 --> 09:41.410]  somewhat different. So, that's definitely an area that's kind of fun to play around with.
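
The random walk for honeywords described above can be sketched like this. The grammar, its probabilities, and the function names are invented for illustration; the only technique shown is sampling each step of the grammar in proportion to its probability rather than always taking the most probable option. A seeded `random.Random` is used so a run can be repeated.

```python
import random

# Toy grammar (hypothetical values).
BASE_STRUCTURES = [(("D",), 0.5), (("A", "D"), 0.5)]
TERMINALS = {
    "A": [("password", 0.5), ("dragon", 0.5)],
    "D": [("123456", 0.8), ("2021", 0.2)],
}

def weighted_pick(options, rng):
    """Pick one value from (value, weight) pairs, in proportion to its weight."""
    values = [value for value, _ in options]
    weights = [weight for _, weight in options]
    return rng.choices(values, weights=weights, k=1)[0]

def honeyword(rng):
    """Random walk: sample a base structure, then sample a terminal for each slot."""
    structure = weighted_pick(BASE_STRUCTURES, rng)
    return "".join(weighted_pick(TERMINALS[symbol], rng) for symbol in structure)

def honeywords(n, seed=0):
    """Generate n fake passwords for decoy accounts."""
    rng = random.Random(seed)
    return [honeyword(rng) for _ in range(n)]
```

Over many samples the most probable string in the grammar still shows up most often, while less likely combinations appear alongside it. A biased walk, as mentioned above, would amount to reweighting these picks per fake user; that part is not sketched here.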
[09:50.400 --> 09:54.840]  What have you seen in terms of variance between, say, company one and company two? So, that's just
[09:54.920 --> 10:02.420]  a question that was asked. And a short answer is I really haven't seen that. And by that, I mean,
[10:02.420 --> 10:06.600]  I'm a researcher. This is a hobby for me. So, almost every single dump that I've worked with
[10:06.600 --> 10:11.980]  has been, I would say, the public dumps. And while there definitely are company passwords that,
[10:11.980 --> 10:18.900]  you know, get pushed out there, especially if you look for, like, NTLM. But I'm a researcher. So,
[10:18.900 --> 10:22.960]  you'd actually have to ask, you know, someone like, you know, KoreLogic about that there. So,
[10:22.960 --> 10:25.900]  I'm not a really good person to answer. And that's why I'm always really interested in hearing these
[10:25.900 --> 10:29.680]  talks and talking to people who are actually doing this professionally. Because they're able to
[10:29.680 --> 10:33.640]  provide me that information there. But pretty much all I have from that is secondhand.
[10:39.700 --> 10:42.840]  So, we're getting to the end here. It looks like there's just a couple more minutes. So,
[10:42.840 --> 10:48.500]  if anyone has any questions, feel free to ask them. Otherwise, I really do appreciate you
[10:49.000 --> 10:53.500]  coming in and, you know, viewing this talk. As I said, I'm available on Twitter,
[10:54.440 --> 11:00.600]  @lakiw. And if you really want to get in touch with me, the best way to do it is
[11:00.600 --> 11:07.100]  submit an issue to the GitLab repo. That's probably the best way. Because I pretty much
[11:07.100 --> 11:11.040]  obsessively track that there. And I'll definitely try to get back to you. But this
[11:11.040 --> 11:14.320]  is something that I find is really fun. And I really do appreciate you taking an interest in
[11:14.320 --> 11:27.670]  this. Cool. Yep. And I'll be on Discord for the rest of the weekend, too. So,
[11:27.670 --> 11:29.830]  if anyone has any questions, feel free to pop in there, too.
